Optimal utility and probability functions for agents with finite computational precision
When making economic choices, such as those between goods or gambles, humans act as if their internal representations of the value and probability of a prospect are distorted away from the true values. These distortions give rise to decisions that apparently fail to maximize reward, and to preferences that reverse without reason. Why would humans have evolved to encode value and probability in a distorted fashion, in the face of selective pressure for reward-maximizing choices? Here, we show that under the simple assumption that humans make decisions with finite computational precision (in other words, that decisions are irreducibly corrupted by noise), the distortions of value and probability displayed by humans are approximately optimal, in that they maximize reward and minimize uncertainty. In two empirical studies, we manipulate factors that change the reward-maximizing form of distortion and find that in each case humans adapt optimally to the manipulation. This work suggests an answer to the longstanding question of why humans make "irrational" economic choices.
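The abstract's core assumption, that choices are corrupted by late noise so that even a perfectly monotone utility cannot guarantee reward-maximizing decisions, can be illustrated with a minimal simulation. The function names, the square-root utility, and the noise level below are illustrative assumptions, not details taken from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

def choose(v1, v2, utility, noise_sd, n=100_000):
    """Simulate n binary choices between option values v1 > v2.
    Each option's utility is corrupted by late Gaussian noise, and the
    agent picks whichever noisy utility is larger. Returns the fraction
    of reward-maximizing (correct) choices."""
    u1 = utility(v1) + rng.normal(0.0, noise_sd, n)
    u2 = utility(v2) + rng.normal(0.0, noise_sd, n)
    return float(np.mean(u1 > u2))

linear = lambda v: v
compressive = lambda v: v ** 0.5   # a concave distortion, prospect-theory style

# Without noise, any monotone utility, distorted or not, chooses perfectly.
acc_noiseless = choose(10.0, 8.0, compressive, noise_sd=0.0)

# With late noise, accuracy falls below ceiling; which distortion then
# maximizes reward depends on the noise and the distribution of values,
# which is the question the paper addresses.
acc_noisy = choose(10.0, 8.0, linear, noise_sd=2.0)
```

In this sketch the noiseless agent is always correct, while the noisy agent errs on a fraction of trials regardless of the utility's shape.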
Human value learning and representation reflect rational adaptation to task demands
Humans and other animals routinely make choices between goods of different values. Choices are often made within identifiable contexts, such that an efficient learner may represent values relative to their local context. However, if goods occur across multiple contexts, a relative value code can lead to irrational choices. In this case, an absolute, context-independent value is preferable to a relative code. Here we test the hypothesis that value representation is not fixed but rationally adapted to context expectations. In two experiments, we manipulated participants' expectations about whether item values learned within local contexts would need to be subsequently compared across contexts. Despite identical learning experiences, the group whose expectations included choices across local contexts went on to learn a more absolute-like representation than the group whose expectations covered only fixed local contexts. Human value representation is thus neither relative nor absolute but efficiently and rationally tuned to task demands.
Where does value come from?
The computational framework of reinforcement learning (RL) has allowed us to both understand biological brains and build successful artificial agents. However, in this opinion, we highlight open challenges for RL as a model of animal behaviour in natural environments. We ask how the external reward function is designed for biological systems, and how we can account for the context sensitivity of valuation. We summarise both old and new theories proposing that animals track current and desired internal states and seek to minimise the distance to a goal across multiple value dimensions. We suggest that this framework readily accounts for canonical phenomena observed in the fields of psychology, behavioural ecology, and economics, and recent findings from brain-imaging studies of value-guided decision-making
Ventromedial prefrontal cortex encodes a latent estimate of cumulative reward.
Humans and other animals accumulate resources, or wealth, by making successive risky decisions. If and how risk attitudes vary with wealth remains an open question. Here humans accumulated reward by accepting or rejecting successive monetary gambles within arbitrarily defined temporal contexts. Risk preferences changed substantially toward risk aversion as reward accumulated within a context, and blood oxygen level dependent (BOLD) signals in the ventromedial prefrontal cortex (PFC) tracked the latent growth of cumulative economic outcomes. Risky behavior was captured by a computational model in which reward prompts an adaptive update to the function that links utilities to choices. These findings can be understood if humans have evolved economic decision policies that fail to maximize overall expected value but reduce variance in cumulative outcomes, thereby ensuring that resources remain above a critical survival threshold
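The computational model described above, in which accumulated reward adaptively reshapes the function linking utilities to choices, can be sketched in miniature. Everything here (the power utility, the specific rho values, the gamble distribution) is a hypothetical illustration of how growing utility curvature produces risk aversion, not the authors' actual model:

```python
import numpy as np

rng = np.random.default_rng(1)

def acceptance_rate(rho, n=100_000):
    """Fraction of gambles (50% chance of winning w, else nothing) chosen
    over a smaller sure amount, under power utility u(x) = x ** rho.
    rho = 1 is risk neutral; rho < 1 bends utility concave (risk averse)."""
    w = rng.uniform(1.0, 20.0, n)           # gamble payoff if won
    sure = rng.uniform(0.2, 0.6, n) * w     # sure option, straddling half the EV
    return float(np.mean(0.5 * w ** rho > sure ** rho))

# Hypothetical adaptive rule: as reward accumulates within a context,
# curvature increases (rho falls), so the same gambles are refused more often.
rate_early = acceptance_rate(rho=1.0)   # risk neutral at the start of a context
rate_late = acceptance_rate(rho=0.5)    # risk averse after reward has accrued
```

Under this toy rule, acceptance of identical gambles drops sharply as rho falls, mirroring the shift toward risk aversion the study reports as reward accumulates.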
Model sharing in the human medial temporal lobe
Effective planning involves knowing where different actions take us. However, natural environments are rich and complex, leading to an exponential increase in memory demand as a plan grows in depth. One potential solution is to filter out features of the environment irrelevant to the task at hand. This enables a shared model of transition dynamics to be used for planning over a range of different input features. Here, we asked human participants (13 male, 16 female) to perform a sequential decision-making task, designed so that knowledge should be integrated independently of the input features (visual cues) present in one case but not in another. Participants efficiently switched between using a low-dimensional (cue-independent) and a high-dimensional (cue-specific) representation of state transitions. fMRI data identified the medial temporal lobe as a locus for learning state transitions. Within this region, multivariate patterns of BOLD responses as state associations changed (via trial-by-trial learning) were less correlated between trials with differing input features in the high-dimensional than in the low-dimensional case, suggesting that these patterns switched between separable (specific to input features) and shared (invariant to input features) transition models. Finally, we show that transition models are updated more strongly following the receipt of positive compared to negative outcomes, a finding that challenges conventional theories of planning. Together, these findings propose a computational and neural account of how information relevant for planning can be shared and segmented in response to the vast array of contextual features we encounter in our world.

SIGNIFICANCE STATEMENT: Effective planning involves maintaining an accurate model of which actions take us to which locations. But in a world awash with information, mapping actions to states with the right level of complexity is critical. Using a new decision-making "heist task" in conjunction with computational modelling and fMRI, we show that patterns of BOLD responses in the medial temporal lobe, a brain region key for prospective planning, become less sensitive to the presence of visual features when these are irrelevant to the task at hand. By flexibly adapting the complexity of task state representations in this way, state-action mappings learned under one set of features can be used to plan in the presence of others.
Training discrimination diminishes maladaptive avoidance of innocuous stimuli in a fear conditioning paradigm
Anxiety disorders are the most common mental disorders worldwide. Although anxiety disorders differ in the nature of the feared objects or situations, they share a common mechanism by which fear generalizes to related but innocuous objects, eliciting avoidance of objects and situations that pose no objective risk. This overgeneralization appears to be a crucial mechanism in the persistence of anxiety psychopathology. In this study we test whether an intervention that promotes discrimination learning reduces generalization of fear, in particular harm expectancy and avoidance, compared to an irrelevant (control) training. Healthy participants (N = 80) were randomly allocated to a training condition. Using a fear conditioning paradigm, participants first learned visual danger and safety signals (set 1). Baseline stimulus generalization was tested with ambiguous stimuli on a spectrum between the danger and safety signals; there were no differences between the training groups. Participants then received either the stimulus discrimination training or a control training. After training, participants learned a new set of danger and safety signals (set 2), and the generalization of harm expectancy and behavioural avoidance of ambiguous stimuli was tested. Although the training groups did not differ in fear generalization on a cognitive level (harm expectancy), they showed a different pattern of avoidance of ambiguous stimuli, with the discrimination training group showing less avoidance of stimuli that resembled the safety signals. These results support the potential of interventions that promote discrimination learning in the treatment of anxiety disorders.